Abstract:
The execution process of a modern Web application is usually represented as a partially ordered sequence of basic actions issued by a client (login, buy, exit, etc.; the login action usually precedes purchasing). Based on these actions, a finite automaton of fine-grained authorization checks may be specified in a separate layer that is easily configurable for the security needs of a particular application. In the mobile case there may be two such state machines: one performing state-based authorization checks on the application execution process and the other performing such checks on the mobile-agent execution process. Authorization checks of these machines may be both state-based and policy-based, and the policies should distinguish between human clients and mobile agents. We develop a framework to specify and enforce fine-grained state-based authorization checks of Web application execution, consisting of a Web browser (client) and a server. We adapt this framework to the mobile case so that the state machines representing fine-grained authorization checks of application and mobile-agent execution are synchronized.
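The separately configurable state-machine layer described above can be sketched as a table-driven automaton; the states, actions, and transitions below are invented for illustration and are not the paper's actual framework.

```python
# Hypothetical transition table: (current state, client action) -> next state.
# An action with no entry for the current state is denied.
TRANSITIONS = {
    ("anonymous", "login"):    "authenticated",
    ("authenticated", "buy"):  "authenticated",
    ("authenticated", "exit"): "anonymous",
}

class AuthAutomaton:
    """State-based authorization check: allow an action only if the
    automaton defines a transition for it in the current state."""

    def __init__(self):
        self.state = "anonymous"

    def check(self, action):
        nxt = TRANSITIONS.get((self.state, action))
        if nxt is None:
            return False          # denied: no transition in this state
        self.state = nxt          # allowed: advance the execution process
        return True
```

Because the table lives outside the application code, the checks can be reconfigured for a particular application's security needs without touching the application itself.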
Abstract:
We present an algorithm called the Best Trail Algorithm, which helps solve the hypertext navigation problem by automating the construction of memex-like trails through the corpus. The algorithm performs a probabilistic best-first expansion of a set of navigation trees to find relevant and compact trails. We describe the implementation of the algorithm, scoring methods for trails, filtering algorithms and a new metric called potential gain which measures the potential of a page for future navigation opportunities.
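A best-first expansion of navigation trees of the kind described above can be sketched as follows; the link graph, per-page relevance scores, and the trail-scoring function (relevance rewarded, length penalised as a crude compactness term) are all illustrative assumptions, not the authors' implementation.

```python
import heapq

# Hypothetical site link graph and per-page relevance scores.
LINKS = {
    "home": ["docs", "blog"],
    "docs": ["api", "faq"],
    "blog": ["post1"],
    "api": [], "faq": [], "post1": [],
}
RELEVANCE = {"home": 0.2, "docs": 0.9, "api": 0.8,
             "faq": 0.3, "blog": 0.1, "post1": 0.4}

def trail_score(trail):
    # Average relevance: favours relevant yet compact trails.
    return sum(RELEVANCE[p] for p in trail) / len(trail)

def best_trail(start, max_len=4):
    """Best-first expansion of trails rooted at `start`."""
    best = (trail_score([start]), [start])
    frontier = [(-best[0], [start])]          # max-heap via negated scores
    while frontier:
        neg_score, trail = heapq.heappop(frontier)
        if -neg_score > best[0]:
            best = (-neg_score, trail)
        if len(trail) >= max_len:
            continue
        for nxt in LINKS[trail[-1]]:
            if nxt not in trail:              # keep trails acyclic
                new = trail + [nxt]
                heapq.heappush(frontier, (-trail_score(new), new))
    return best
```

A probabilistic variant, as in the paper, would sample which tree to expand instead of always popping the single highest-scoring trail.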
Abstract:
So far, conceptual modeling of Web applications has been used primarily in the upper part of the life cycle, as a driver for system analysis. Little attention has been paid to exploiting the conceptual models developed during analysis for application evaluation, maintenance and evolution. This paper illustrates an approach for integrating the use of conceptual models in the lower part of the application life cycle, by exploiting them in quality analysis and usage evaluation. A prototype tool for supporting the described evaluation activities is also presented.
Abstract:
In this paper we perform a study of the image contents of the Chilean web (.cl domain) using automatic feature extraction, content-based analysis and face detection algorithms. In an automated process we examine all .cl websites and download a large number of the images available (approx. 83,000). Then we extract several visual features (color, texture, shape, etc.) and we perform face detection using novel algorithms. Using this process we semi-automatically characterize the image content of the web in Chile in terms of the detected faces and the visual features obtained automatically. We present statistics of use to anyone concerned with the image content of the web in Chile. Our study is the first one to use content-based tools to determine the image contents of the web.
Abstract:
This paper describes a novel multi-tier architecture for a search engine. Based on observations from query log analysis as well as properties of a ranking formula, we derive a method to tier documents in a search engine. This allows for increased performance while keeping the order of the results returned, and hence relevance, almost "untouched". The architecture and method have been tested at large scale on a carrier-class search engine with 1 billion documents. The architecture gives a huge increase in capacity, and is in use today for a major search engine.
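The tiering idea can be sketched as a two-tier index where queries fall through to the lower tier only when the top tier cannot fill the result page; the quality scores, threshold, and toy documents here are assumptions for illustration, not the paper's ranking formula.

```python
# Documents are assigned to tiers once, by a static quality score.
TIER_THRESHOLD = 0.5

docs = [
    {"id": 1, "terms": {"web", "search"},     "quality": 0.9},
    {"id": 2, "terms": {"web"},               "quality": 0.4},
    {"id": 3, "terms": {"search", "ranking"}, "quality": 0.7},
    {"id": 4, "terms": {"ranking"},           "quality": 0.2},
]

tier1 = [d for d in docs if d["quality"] >= TIER_THRESHOLD]
tier2 = [d for d in docs if d["quality"] < TIER_THRESHOLD]

def search(query, k=2):
    """Query tier 1 first; consult tier 2 only if fewer than k hits."""
    def match(tier):
        hits = [d for d in tier if query & d["terms"]]
        return sorted(hits, key=lambda d: d["quality"], reverse=True)

    results = match(tier1)
    if len(results) < k:              # fall through to the larger, cheaper tier
        results += match(tier2)
    return [d["id"] for d in results[:k]]
```

Because most queries are answered entirely from the small top tier, capacity grows while result order is nearly unchanged, which is the trade-off the abstract describes.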
Abstract:
Distribution of streaming media content, including live news, music and videos, is becoming increasingly popular in today's Internet. Traditional client/server architectures are inefficient for distributing streaming media objects because of the high demands for system resources, especially server and network bandwidth, which severely limit the total number of simultaneous users the system can support. One proposal for improving the scalability of media distribution systems is the use of P2P overlay networks. Although a number of previous works have evaluated different aspects of P2P systems, mainly through simulation, there is a lack of a thorough quantitative analysis of the requirements for server and network resources (i.e., CPU, server and network bandwidth) in actual P2P systems, compared to traditional client/server systems. This work aims at filling this gap by providing experimental results that quantify the savings in server and network resources when a P2P approach is used for distributing live streaming media instead of the traditional client/server approach. Towards this goal, we build an experimental testbed, in a controlled environment, to evaluate actual systems with a varying number of clients during periods when the distribution tree is static. A key component of this experimental testbed is a new efficient and scalable application called the streaming servent, which can act both as a client and a server, forwarding packets to other clients. We also use simple analytical formulas to evaluate the scalability of our servent application. The experimental results quantify the intuitively better scalability of the P2P architecture. As an example, the total server bandwidth decreases from 15 Mbits/s to 9 Mbits/s (a 40% reduction) if a P2P architecture is used instead of a client/server architecture for live delivery of a given file to 24 clients.
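The kind of simple analytical formula the abstract refers to can be sketched as follows. The tree fan-out and per-stream rate are illustrative assumptions; note that this idealized tree model ignores real-system overheads, so it predicts larger savings than the 40% actually measured.

```python
def server_bw_client_server(n_clients, rate_mbps):
    # Unicast client/server: the server streams one copy to every client.
    return n_clients * rate_mbps

def server_bw_p2p(n_clients, rate_mbps, fanout):
    # Idealized P2P tree: the server feeds only its direct children; every
    # other client is fed by a peer servent forwarding up to `fanout` copies.
    direct_children = min(n_clients, fanout)
    return direct_children * rate_mbps

# With 24 clients at 0.625 Mbit/s (so that client/server needs 15 Mbit/s),
# an ideal tree with fan-out 4 would need only 2.5 Mbit/s at the server.
cs_bw  = server_bw_client_server(24, 0.625)
p2p_bw = server_bw_p2p(24, 0.625, fanout=4)
```

Comparing such formulas against the testbed measurements is exactly how the experimental gap between ideal and actual savings becomes visible.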
Abstract:
"History Teaches Everything, Including the Future", wrote Alphonse de Lamartine in the nineteen century. Even if history cannot be really considered a predictive science, historical information can successfully be used in many fie...
展开
"History Teaches Everything, Including the Future", wrote Alphonse de Lamartine in the nineteen century. Even if history cannot be really considered a predictive science, historical information can successfully be used in many fields. This paper deals with Web Search Engines and their Query logs, which contain historical information about past usage of such systems. We will present some of the most interesting results obtained in this field by the High Performance Computing Lab in Pisa in collaboration with Research Labs worldwide. The techniques reviewed are mainly focused on enhancing the efficiency of large-scale distributed search systems.
Abstract:
In this paper we process and analyze web search engine query and click data from the perspective of the documents (URLs) selected. We initially define possible document categories and select descriptive variables to define the documents. The URL dataset is preprocessed and analyzed using some traditional statistical methods, and then processed by the Kohonen SOM clustering technique [5], which we use to produce a two-level clustering. The clusters are interpreted in terms of the document categories and variables defined initially. Then we apply the C4.5 [9] rule induction algorithm to produce a decision tree for the document category. The objective of the work is to apply a systematic data mining process to click data, contrasting unsupervised (Kohonen) and supervised (C4.5) methods to cluster and model the data, in order to identify document profiles which relate to theoretical user behavior and document (URL) organization.
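The unsupervised-then-supervised pipeline can be sketched in miniature as follows, with k-means (Lloyd's algorithm) standing in for the Kohonen SOM and a per-cluster majority-vote rule standing in for the C4.5 tree; the click features, toy URLs, and category labels are all invented for illustration.

```python
from collections import Counter

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, centroids, iters=10):
    """Lloyd's algorithm; stands in for the SOM clustering stage."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            groups[i].append(p)
        centroids = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def nearest(centroids, p):
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))

# Toy click features per URL: (normalized mean click rank, click-through rate).
points = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
labels = ["navigational", "navigational", "content", "content"]

centroids = kmeans(points, [(0.0, 0.0), (1.0, 1.0)])

# Supervised stage (C4.5 stand-in): one majority-vote rule per cluster.
rule = {}
for i in range(len(centroids)):
    members = [lab for p, lab in zip(points, labels)
               if nearest(centroids, p) == i]
    rule[i] = Counter(members).most_common(1)[0][0]

def predict(p):
    """Map a new URL's click features to a document category."""
    return rule[nearest(centroids, p)]
```

The contrast the abstract draws is visible even here: the clustering stage groups URLs without labels, and the rule-induction stage then explains those groups in terms of the predefined document categories.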
Abstract:
For the software and web measurement field to become a more robust engineering discipline, it is essential that researchers and other stakeholders reach a common agreement about primitive concepts such as attribute, metric, measure, measurement and calculation method, scale, elementary and global indicator, and calculable concept, among others. Various useful recently issued ISO standards relate to software quality models and to measurement and evaluation processes; however, we sometimes observe a lack of sound consensus on the same terms across different documents, or terms that are absent altogether. In this work we present an ontology for software metrics and indicators, based as much as possible on the concepts of those standards, which can be useful to support different assurance processes, methods and tools, in addition to being the foundation for our cataloging web system [12]. Without sound, agreed-upon definitions it is difficult to ensure metadata consistency and, ultimately, that data values are comparable on the same basis.
Abstract:
Entity Ranking (ER) is a recently emerging search task in Information Retrieval, where the goal is not finding documents matching the query words, but instead finding entities which match types and attributes mentioned in the query. In this paper we propose a formal model to define entities as well as a complete ER system, providing examples of its application to enterprise, Web, and Wikipedia scenarios. Since searching for entities on Web scale repositories is an open challenge as the effectiveness of ranking is usually not satisfactory, we present a set of algorithms based on our model and evaluate their retrieval effectiveness. The results show that combining simple Link Analysis, Natural Language Processing, and Named Entity Recognition methods improves retrieval performance of entity search by over 53% for P@10 and 35% for MAP.
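The two effectiveness measures cited above, P@10 and MAP, follow standard IR definitions and can be computed as below; the ranked list and relevance judgments in the example are invented for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked entities that are relevant (P@k)."""
    hits = sum(1 for e in ranked[:k] if e in relevant)
    return hits / k

def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant entity
    appears; MAP is the mean of this over all queries."""
    hits, total = 0, 0.0
    for i, e in enumerate(ranked, start=1):
        if e in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

# Toy single-query example: a ranked entity list and its relevant set.
ranked = ["e1", "e4", "e2", "e7", "e3"]
relevant = {"e1", "e2", "e3"}
```

An improvement "over 53% for P@10" is relative: a baseline P@10 of 0.2 rising to 0.31 would qualify, without either absolute value being stated in the abstract.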